Cross Validation



Cross-validation Confidence Intervals for Test Error

Pierre Bayle

Neural Information Processing Systems

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.
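A minimal sketch of the kind of interval this abstract describes: pool the held-out per-example losses from all k folds, estimate the asymptotic variance from their sample variance, and form a normal-approximation confidence interval. The data, the loss, and the 1.96 normal quantile are illustrative assumptions, not the paper's exact construction or variance estimator.

```python
# Hedged sketch: a normal-approximation confidence interval for the
# k-fold cross-validation test error, assuming per-example losses from
# each held-out fold are available.
import numpy as np

def kfold_error_ci(losses_per_fold, z=1.96):
    """Pool held-out per-example losses from all k folds and return
    mean +/- z * sqrt(var_hat / n), where var_hat is the sample
    variance of the pooled losses."""
    losses = np.concatenate([np.asarray(f, dtype=float)
                             for f in losses_per_fold])
    n = losses.size
    err_hat = losses.mean()
    half_width = z * losses.std(ddof=1) / np.sqrt(n)
    return err_hat - half_width, err_hat + half_width

rng = np.random.default_rng(0)
# e.g. 0/1 classification losses from 5 folds of 40 held-out points each
folds = [rng.binomial(1, 0.3, size=40) for _ in range(5)]
lo, hi = kfold_error_ci(folds)
```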


Approximate cross-validation (ACV) methods may be slow and inaccurate in GLM problems with high data dimension

Neural Information Processing Systems

We are grateful to the reviewers for their helpful feedback. We also provide an efficiently computable upper bound on the error of our ACV method. For (2), there is thus real interest in the full-covariate GLM. We will illustrate with an empirical comparison to the proposal in (1) in our revision. We searched for relevant papers by (e.g.) Noureddine El Karoui but did not find one.



A computationally efficient and accurate method for approximate cross-validation (ACV)

Neural Information Processing Systems

We thank the reviewers for their helpful comments. We completely agree and will make this point very early in a revised manuscript. We will include both numbers and figures in a revision. We prove (Section 4) that the IJ approximation error increases smoothly with the error in the initial fit. Our neural CRF experiments in Section 5 provide empirical confirmation.


An Honest Cross-Validation Estimator for Prediction Performance

Pan, Tianyu, Yu, Vincent Z., Devanarayan, Viswanath, Tian, Lu

arXiv.org Machine Learning

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits the data, trains the prediction model on the training set, evaluates the model performance on the test set, and averages the model performance across different data splits. A well-known criticism is that such a cross-validation procedure does not directly estimate the performance of the particular model recommended for future use. In this paper, we propose a new method to estimate the performance of a model trained on a specific (random) training set. A naive estimator can be obtained by applying the model to a disjoint testing set. Surprisingly, cross-validation estimators computed from other random splits can be used to improve this naive estimator within a random-effects model framework. We develop two estimators -- a hierarchical Bayesian estimator and an empirical Bayes estimator -- that perform similarly to or better than both the conventional cross-validation estimator and the naive single-split estimator. Simulations and a real-data example demonstrate the superior performance of the proposed method.
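A toy sketch of the shrinkage idea behind the random-effects framework: pull the naive single-split estimate toward the average of cross-validation estimates from other random splits, with a weight driven by the between-split variance relative to the split-level noise. The weighting formula and variance estimates here are illustrative assumptions, not the paper's hierarchical Bayesian or empirical Bayes estimators.

```python
# Hedged sketch of random-effects shrinkage: the naive single-split
# performance estimate is combined with estimates from other splits.
import numpy as np

def shrink_toward_cv(naive_est, other_split_ests, sigma2_noise):
    """Shrink the naive estimate toward the mean of other splits'
    estimates; more shrinkage when split-level noise dominates the
    between-split variability."""
    other = np.asarray(other_split_ests, dtype=float)
    tau2 = max(other.var(ddof=1), 1e-12)   # between-split variance
    w = tau2 / (tau2 + sigma2_noise)       # weight on the naive estimate
    return w * naive_est + (1.0 - w) * other.mean()
```

With a noisy naive estimate of 0.30 and other-split estimates clustered near 0.25, the combined estimate lands close to 0.25, illustrating how the other splits stabilize the naive estimator.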


Technical note on Fisher Information for Robust Federated Cross-Validation

Khan, Behraj, Syed, Tahir Qasim

arXiv.org Machine Learning

When training data are fragmented across batches or federated across different geographic locations, trained models suffer performance degradation. That degradation partly owes to covariate shift induced by the data having been fragmented across time and space, producing dissimilar empirical training distributions. Each fragment's distribution differs slightly from a hypothetical unfragmented training distribution of covariates, and from the single validation distribution. To address this problem, we propose Fisher Information for Robust fEderated validation (FIRE). This method accumulates fragmentation-induced covariate-shift divergences from the global training distribution via an approximate Fisher information. That term, which we prove to be a more computationally tractable estimate, is then used as a per-fragment loss penalty, enabling scalable distribution alignment. FIRE outperforms importance-weighting benchmarks by up to 5.1% and federated learning (FL) benchmarks by up to 5.3% on shifted validation sets.
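A minimal sketch of a Fisher-weighted per-fragment penalty in the spirit the abstract describes: approximate the Fisher information diagonally by averaged squared per-parameter gradients, then penalize a fragment's parameter drift from the global parameters weighted by that approximation. The penalty form, the diagonal approximation, and the coefficient `lam` are assumptions for illustration, not the FIRE method itself.

```python
# Hedged sketch: diagonal Fisher approximation used as a per-fragment
# drift penalty (illustrative form, not the paper's exact objective).
import numpy as np

def diag_fisher(grads):
    """Diagonal Fisher approximation: mean of squared per-sample
    gradients, one value per parameter."""
    g = np.asarray(grads, dtype=float)
    return (g ** 2).mean(axis=0)

def fragment_penalty(theta_frag, theta_global, fisher_diag, lam=0.1):
    """Fisher-weighted squared drift of a fragment's parameters from
    the global parameters, scaled by an assumed coefficient `lam`."""
    diff = np.asarray(theta_frag, dtype=float) - np.asarray(theta_global, dtype=float)
    return lam * float(np.sum(fisher_diag * diff ** 2))
```

In a training loop, this penalty would be added to each fragment's loss so that parameters important under the (approximate) Fisher information stay aligned with the global model.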


Region-of-Interest Augmentation for Mammography Classification under Patient-Level Cross-Validation

Bigdeli, Farbod, Mohammadagha, Mohsen, Bigdeli, Ali

arXiv.org Artificial Intelligence

Breast cancer screening with mammography remains central to early detection and mortality reduction. Deep learning has shown strong potential for automating mammogram interpretation, yet limited-resolution datasets and small sample sizes continue to restrict performance. We revisit the Mini-DDSM dataset (9,684 images; 2,414 patients) and introduce a lightweight region-of-interest (ROI) augmentation strategy. During training, full images are probabilistically replaced with random ROI crops sampled from a precomputed, label-free bounding-box bank, with optional jitter to increase variability. We evaluate under strict patient-level cross-validation and report ROC-AUC, PR-AUC, and training-time efficiency metrics (throughput and GPU memory). Because ROI augmentation is training-only, inference-time cost remains unchanged. On Mini-DDSM, ROI augmentation (best: p_roi = 0.10, alpha = 0.10) yields modest average ROC-AUC gains, with performance varying across folds; PR-AUC is flat to slightly lower. These results demonstrate that simple, data-centric ROI strategies can enhance mammography classification in constrained settings without requiring additional labels or architectural modifications.
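A hedged sketch of the training-only augmentation the abstract describes: with probability p_roi, replace the full image with a crop drawn from a precomputed bounding-box bank, jittering the box by up to a fraction `alpha` of its size. The function name, box format (x0, y0, x1, y1), and list-of-rows image representation are illustrative assumptions.

```python
# Hedged sketch of probabilistic ROI-crop augmentation with jitter.
# Applied at training time only; inference is unaffected.
import random

def roi_augment(image, bbox_bank, p_roi=0.10, alpha=0.10, rng=random):
    """With probability p_roi, return a jittered ROI crop from a
    precomputed bounding-box bank; otherwise return the full image."""
    if rng.random() >= p_roi or not bbox_bank:
        return image                      # keep the full image
    x0, y0, x1, y1 = rng.choice(bbox_bank)
    w, h = x1 - x0, y1 - y0
    jx = int(alpha * w * (2 * rng.random() - 1))  # horizontal jitter
    jy = int(alpha * h * (2 * rng.random() - 1))  # vertical jitter
    H, W = len(image), len(image[0])
    x0 = min(max(x0 + jx, 0), W - 1); x1 = min(max(x1 + jx, x0 + 1), W)
    y0 = min(max(y0 + jy, 0), H - 1); y1 = min(max(y1 + jy, y0 + 1), H)
    return [row[x0:x1] for row in image[y0:y1]]
```

Because the bank is precomputed and label-free, the only training-time cost is the crop itself, consistent with the unchanged inference cost noted above.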



Regularization Path of Cross-Validation Error Lower Bounds

Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, Ichiro Takeuchi

Neural Information Processing Systems

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, current practice of regularization parameter tuning is more of an art than a science; e.g., it is hard to tell how many grid points would be needed in cross-validation (CV) for obtaining a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV error as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework provides a theoretical approximation guarantee on a set of solutions, in the sense of bounding how far the CV error of the current best solution can be from the best possible CV error over the entire range of the regularization parameter. Our numerical experiments demonstrate that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational cost.